Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit
Abstract
Manetho is a new transparent rollback-recovery protocol for long-running distributed computations. It uses a novel combination of antecedence graph maintenance, uncoordinated checkpointing, and sender-based message logging. Manetho simultaneously achieves the advantages of pessimistic message logging, namely limited rollback and fast output commit, and the advantage of optimistic message logging, namely low failure-free overhead. These advantages come at the expense of a complex recovery scheme.
Index Terms: Antecedence graph, checkpointing, message logging, rollback-recovery, transparent fault tolerance.
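As a rough illustration of two of the ingredients named in the abstract, the following Python sketch models sender-based message logging together with an antecedence graph that records a happened-before relation between message receipts. It is a toy model under stated assumptions, not Manetho's actual implementation: the names `Process`, `send_log`, `logged_for`, and `happened_before` are hypothetical, message delivery is a direct function call, and checkpointing and the recovery scheme are omitted entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical event identifier: (process id, receipt sequence number).
Event = Tuple[str, int]


@dataclass
class Process:
    pid: str
    receipts: int = 0
    # Sender-based log: copies of sent messages kept in the sender's memory.
    send_log: List[Tuple[str, str]] = field(default_factory=list)
    # Antecedence graph: receipt event -> events that directly precede it.
    graph: Dict[Event, Set[Event]] = field(default_factory=dict)

    def last_event(self) -> Optional[Event]:
        return (self.pid, self.receipts) if self.receipts else None

    def send(self, dest: Process, payload: str) -> None:
        # Log at the sender, then piggyback a copy of the known graph.
        self.send_log.append((dest.pid, payload))
        piggyback = {e: set(a) for e, a in self.graph.items()}
        dest.receive(payload, piggyback, self.last_event())

    def receive(self, payload: str, piggyback: Dict[Event, Set[Event]],
                sender_event: Optional[Event]) -> None:
        # The new receipt follows the receiver's previous receipt and
        # the sender's latest receipt at the time of sending.
        parents = {e for e in (self.last_event(), sender_event) if e is not None}
        self.graph.update(piggyback)
        self.receipts += 1
        self.graph[(self.pid, self.receipts)] = parents

    def logged_for(self, dest_pid: str) -> List[str]:
        # Messages this sender could replay to a recovering receiver.
        return [p for d, p in self.send_log if d == dest_pid]


def happened_before(graph: Dict[Event, Set[Event]], event: Event) -> Set[Event]:
    # Transitive closure of the direct-antecedent edges.
    seen: Set[Event] = set()
    stack = list(graph.get(event, ()))
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(graph.get(e, ()))
    return seen


if __name__ == "__main__":
    p, q, r = Process("p"), Process("q"), Process("r")
    p.send(q, "m1")
    q.send(r, "m2")
    r.send(p, "m3")
    print(sorted(happened_before(p.graph, ("p", 1))))
    print(q.logged_for("r"))
```

In this sketch, a receiver that fails could in principle ask each sender for `logged_for(its_pid)`, and the piggybacked graph tells it which receipts preceded which. How the real protocol orders replay and commits output is described in the paper itself and is not modelled here.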
Similar Articles
Implementation and Performance of Transparent Rollback-recovery in Manetho
We describe the implementation and performance of rollback-recovery in Manetho. During failure-free operation, Manetho maintains an antecedence graph which records the "happened before" relation between certain events in the distributed computation. The antecedence graph is used in combination with checkpointing and volatile sender-based message logging to simultaneously achieve low failure-fre...
Efficient Transparent Optimistic Rollback Recovery for Distributed Application Programs
Existing rollback-recovery methods using consistent checkpointing may cause high overhead for applications that frequently send output to the “outside world,” since a new consistent checkpoint must be written before the output can be committed, whereas existing methods using optimistic message logging may cause large delays in committing output, since processes may buffer received messages arbi...
Survey of Backward Error Recovery Techniques for Multicomputers Based on Checkpointing and Rollback
To implement fault tolerance in multicomputer systems, backward error recovery, based on checkpointing and rollback, is often used. During failure-free operation, the process states are regularly saved, and after a fault is detected, the system is rolled back to a previously saved state. We can distinguish four classes of techniques: semi-automatic techniques, message logging, coordinated ch...
Low-cost Checkpointing-based Rollback Recovery Algorithm Considering Scalability
In this paper, we design a low-cost checkpointing-based rollback recovery algorithm that addresses the traditional scalability problem of synchronous checkpointing from a completely different point of view than existing approaches. This algorithm enables a cluster-wide set of processes to carry out their semi-global checkpointing procedure while a small set of cluster heads monitor local commit of th...
An Application-Transparent, Platform-Independent Approach to Rollback-Recovery for Mobile Agent Systems
This paper proposes a new approach to rollback-recovery for mobile-agent systems and describes its implementation in the MESSENGERS mobile agents system. The checkpointing method used makes it possible to implement space- and time-efficient, user-transparent rollback-recovery in heterogeneous distributed environments. Together with an efficient non-blocking system snapshot algorithm this checkpointing met...
Journal title:
- IEEE Trans. Computers
Volume: 41
Publication date: 1992